Trying to build symbolic lDebugX using makex or makexd of the current symbolic branch of https://hg.pushbx.org/ecm/ldebug results in NASM being killed by the OS. (On an amd64 Linux Debian 10 server.) I was able to determine that a long define appears to be the cause of this behaviour.

Here's a test case that works on all three tested NASM versions (TEST=0) or fails on the recent versions (TEST=10, which is the default):

test$ cat test.asm
%if 0

NASM bug test case
 2021 by C. Masloch

Usage of the works is permitted provided that this
instrument is retained with the works, so that any entity
that uses the works is notified of this instrument.

DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.

%endif

%define PATCH_386_TABLE ""

; Instruction if no 386+ CPU
%macro _no386 0-1+.nolist
%push
%assign %$entry $+CODESECTIONFIXUP
%1 ; write instruction
%rep ($+CODESECTIONFIXUP) - %$entry ; count size of instruction
%define PATCH_386_TABLE %[PATCH_386_TABLE],%[%$entry] ; write a patch for each byte
%assign %$entry %$entry+1
%endrep
%pop
%endmacro

org 0
%define CODESECTIONFIXUP -code_start+0
code_start:

%ifndef TEST
%assign TEST 10
%endif
_no386 times 3030 + TEST nop

%xdefine PATCH_386_TABLE 
PATCH_386_TABLE,5800,5801,5802,5815,5820,6091,6137,6165,6174,6180,6217,6233,6414,7328,7329,7330,7331,7488,7538,7636,8078,8393,8394,8395,8396,8397,8398,8399,8400,8401,8402,8403,8404,8405,8406,8407,8408,9227,9625,9644,9681,9711,9713,9716,9737,9740,9741,9742,9743,9744,9748,9751,9752,9753,9754,9755,9756,9757,9758,9942,10178,10179,10180,10181,10182,10183,10184,10185,10186,10187,10985,10997,11596,11638,11639,11640,11653,11654,11655,11656,11657,11658,11659,11660,11661,11910,11911,11912,11913,11914,11915,11916,11917,11918,11919,11960,13257,13265,13270,13292,13293,13294,13295,13296,13297,13298,13299,13300,13301,13302,13352,13416,13464,13465,13466,13469,13470,13471,13506,13524,13545,13564,13574,13575,13576,13577,13578,13579,13580,13605,13623,13625,13627,13629,13648,13650,13652,13654,13698,13699,13700,13701,13881,13923,13924,13925,13938,13939,13940,13941,13942,13943,13944,13945,13946,14712,14779,14780,14781,14782,14787,14788,14789,14790,14791,14792,14793,14794,14795,14796,14849,14854,15015,15058,15059,15060,15061,15078,15079,15080,15081,15082,15087,15088,15089,15165,15167,15169,15171,15178,15179,15180,15181,15182,15183,15184,15185,15186,15291,15301,15306,15307,15308,15309,15320,15321,15327,15328,15329,15330,15331,15332,15333,15334,15335,15336,15337,15338,15339,15340,15341,15342,15343,15344,15345,15346,15347,15348,15349,15350,15353,15428,15514,15522,15523,15524,15525,15526,15563,16232,16564,16565,16566,16567,16568,16569,16570,16571,16611,16612,16671,16676,16681,16704,16705,16706,16707,16708,16709,16739,16740,17147,17148,17158,17160,17193,17195,17546,17558,17720,17725,17728,17734,17772,17777,17782,17784,17794,17796,17798,17802,17805,17808,17811,17843,17846,17889,17928,17929,17930,17931,17932,17933,17934,17935,17936,17937,17938,18186,18187,18188,18189,18190,18191,18192,18193,18194,18505,18506,18507,18508,18509,18510,18511,18512,18513,18514,18515,18516,18517,18781,19274,19646,19651,19656,19683,19704,19705,19706,19707,19708,19709,19710,19711,19712,19713,19714,20009,20226,20273,2027
4,20278,20282,20286,20287,20291,20295,20299,20300,20304,20308,20312,20313,20317,20321,21598,21604,21631,22083,22084,22085,22086,22087,22088,22089,22090,22091,22092,22093,22094,22095,22096,22322,22325,22361,22803,22805,22810,22819,22821,22824,22831,22833,22838,22847,22849,22852,22859,22868,22878,22962,23071,23072,23073,23074,23075,23083,23084,23212,23222,23223,23224,23245,23255,23256,23257,23258,23259,23260,23270,23271,23272,23279,23286,23288,23299,23328,23329,23330,23336,23338,23350,23351,23352,23370,23371,23372,23373,23485,23492,23495,23496,23497,23508,23546,23555,23815,23816,23817,23818,23819,29190,29608,30573,30574,30575,30576,30577,30578,32975,32977,32979,32981,32983,32985,32987,33012,33014,33016,33018,33020,33022,33024,33172,33173,33174,33175,33176,33177,33178,33179,33180,33181,33182,33183,33184,33185,33186,33187,33188,33189,33190,33191,33271,33272,33273,33274,33275,33276,33277,33295,33299,33303,33307,33311,33315,33377,33378,33379,33380,33400,33404,33408,33412,33416,33420,33434,34013,34027,34030,34076,34079,34121,34576,34577,34578,34579,34580,34581,34582,34583,34584,34585,34586,34607,34608,34609,34610,34611,34612,34613,34614,34615,34616,34617,45818,46072,46075,46078,46080,46107,46110,46133,46134,46135,46136,46139,46140,46141,46145,46147,47058,47059,47060,47061,47377,47378,47379,47380,47474,47475,47476,47477,48272,48289,48292,48296,48299,48305,48307,48310,48313,48749,48759,48760,48761,48762,48796,50174,50183,50185,50227,50230,50265,50271,50272,50273,50274,50304,50305,50306,52079,52151,52164,52180,52188,52189,52190,52209,52281,52286,52306,52456,52459,52461,52476,52478,52481,52491,52502,52503,52504,52674,52675,52676,52708,52709,52710,52711,52712,52724,52725,52726,52727,52728,52742,52753,52758,52759,52760,52761,52762,52763,52764,52765,52766,52767,52768,52769,52770,52771,52772,52773,52774,52775,52776,52777,52778,52779,52780,52781,52782,52783,52784,52785,52786,52787,52788,52789,52803,52804,52805,52816,52822,52825,52852,52853,52854,52855,52903,52904,52905,52917,52919,
52995,52996,52997,52998,52999,53000,53001,53002,53003,53658,53659,54547,54548,54682,54688,54689,54690,54691,54692,54693,54694,54695,54696,54697,54698,54699,54700,54701,54702,54703,54704,54705,54706,54707,54708,54709,54710,54711,54712,54713,54714,54715,54716,54717,56482,56499,56504,56980,56981,56982,56990,56991,56992,56995,57178,57179,57180,57229,57240,57241,57242

%macro count 0-*
%warning %0
%endmacro
count PATCH_386_TABLE
%defstr string PATCH_386_TABLE
%strlen length string
%warning length
test$ oldnasm -v
NASM version 2.14.03rc2 compiled on Aug 31 2019
test$ oldnasm test.asm
test.asm:56: warning: (count:1) 3776 [-w+user]
test.asm:54: ... from macro `count' defined here [-w+user]
test.asm:59: warning: 18442 [-w+user]
test$ nasm -v
NASM version 2.15.03 compiled on Dec 28 2020
test$ nasm test.asm
Killed
test$ newnasm -v
NASM version 2.16rc0 compiled on Aug 8 2021
test$ newnasm test.asm
Killed
test$ oldnasm test.asm -DTEST=0
test.asm:56: warning: (count:1) 3766 [-w+user]
test.asm:54: ... from macro `count' defined here [-w+user]
test.asm:59: warning: 18392 [-w+user]
test$ nasm test.asm -DTEST=0
test.asm:56: warning: 3766 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 18392 [-w+user]
test$ newnasm test.asm -DTEST=0
test.asm:56: warning: 3766 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 18392 [-w+user]
test$
We were able to verify that this crash is caused by the OS's OOM killer. (For some reason the server did not write log messages in the places where my partner expected to find them, which did not help us.)

This is the same test case as in the initial report, also available at https://pushbx.org/ecm/test/20220825/test.asm

Here's the test log. The /usr/bin/time executable is GNU time. Its %M format code reports the maximum resident set size of the process, in KiB.

test/20220825$ cat test.asm
%if 0

NASM bug test case
 2021 by C. Masloch

Usage of the works is permitted provided that this
instrument is retained with the works, so that any entity
that uses the works is notified of this instrument.

DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.

%endif

%define PATCH_386_TABLE ""

; Instruction if no 386+ CPU
%macro _no386 0-1+.nolist
%push
%assign %$entry $+CODESECTIONFIXUP
%1 ; write instruction
%rep ($+CODESECTIONFIXUP) - %$entry ; count size of instruction
%define PATCH_386_TABLE %[PATCH_386_TABLE],%[%$entry] ; write a patch for each byte
%assign %$entry %$entry+1
%endrep
%pop
%endmacro

org 0
%define CODESECTIONFIXUP -code_start+0
code_start:

%ifndef TEST
%assign TEST 10
%endif
_no386 times 3030 + TEST nop

%xdefine PATCH_386_TABLE 
PATCH_386_TABLE,5800,5801,5802,5815,5820,6091,6137,6165,6174,6180,6217,6233,6414,7328,7329,7330,7331,7488,7538,7636,8078,8393,8394,8395,8396,8397,8398,8399,8400,8401,8402,8403,8404,8405,8406,8407,8408,9227,9625,9644,9681,9711,9713,9716,9737,9740,9741,9742,9743,9744,9748,9751,9752,9753,9754,9755,9756,9757,9758,9942,10178,10179,10180,10181,10182,10183,10184,10185,10186,10187,10985,10997,11596,11638,11639,11640,11653,11654,11655,11656,11657,11658,11659,11660,11661,11910,11911,11912,11913,11914,11915,11916,11917,11918,11919,11960,13257,13265,13270,13292,13293,13294,13295,13296,13297,13298,13299,13300,13301,13302,13352,13416,13464,13465,13466,13469,13470,13471,13506,13524,13545,13564,13574,13575,13576,13577,13578,13579,13580,13605,13623,13625,13627,13629,13648,13650,13652,13654,13698,13699,13700,13701,13881,13923,13924,13925,13938,13939,13940,13941,13942,13943,13944,13945,13946,14712,14779,14780,14781,14782,14787,14788,14789,14790,14791,14792,14793,14794,14795,14796,14849,14854,15015,15058,15059,15060,15061,15078,15079,15080,15081,15082,15087,15088,15089,15165,15167,15169,15171,15178,15179,15180,15181,15182,15183,15184,15185,15186,15291,15301,15306,15307,15308,15309,15320,15321,15327,15328,15329,15330,15331,15332,15333,15334,15335,15336,15337,15338,15339,15340,15341,15342,15343,15344,15345,15346,15347,15348,15349,15350,15353,15428,15514,15522,15523,15524,15525,15526,15563,16232,16564,16565,16566,16567,16568,16569,16570,16571,16611,16612,16671,16676,16681,16704,16705,16706,16707,16708,16709,16739,16740,17147,17148,17158,17160,17193,17195,17546,17558,17720,17725,17728,17734,17772,17777,17782,17784,17794,17796,17798,17802,17805,17808,17811,17843,17846,17889,17928,17929,17930,17931,17932,17933,17934,17935,17936,17937,17938,18186,18187,18188,18189,18190,18191,18192,18193,18194,18505,18506,18507,18508,18509,18510,18511,18512,18513,18514,18515,18516,18517,18781,19274,19646,19651,19656,19683,19704,19705,19706,19707,19708,19709,19710,19711,19712,19713,19714,20009,20226,20273,2027
4,20278,20282,20286,20287,20291,20295,20299,20300,20304,20308,20312,20313,20317,20321,21598,21604,21631,22083,22084,22085,22086,22087,22088,22089,22090,22091,22092,22093,22094,22095,22096,22322,22325,22361,22803,22805,22810,22819,22821,22824,22831,22833,22838,22847,22849,22852,22859,22868,22878,22962,23071,23072,23073,23074,23075,23083,23084,23212,23222,23223,23224,23245,23255,23256,23257,23258,23259,23260,23270,23271,23272,23279,23286,23288,23299,23328,23329,23330,23336,23338,23350,23351,23352,23370,23371,23372,23373,23485,23492,23495,23496,23497,23508,23546,23555,23815,23816,23817,23818,23819,29190,29608,30573,30574,30575,30576,30577,30578,32975,32977,32979,32981,32983,32985,32987,33012,33014,33016,33018,33020,33022,33024,33172,33173,33174,33175,33176,33177,33178,33179,33180,33181,33182,33183,33184,33185,33186,33187,33188,33189,33190,33191,33271,33272,33273,33274,33275,33276,33277,33295,33299,33303,33307,33311,33315,33377,33378,33379,33380,33400,33404,33408,33412,33416,33420,33434,34013,34027,34030,34076,34079,34121,34576,34577,34578,34579,34580,34581,34582,34583,34584,34585,34586,34607,34608,34609,34610,34611,34612,34613,34614,34615,34616,34617,45818,46072,46075,46078,46080,46107,46110,46133,46134,46135,46136,46139,46140,46141,46145,46147,47058,47059,47060,47061,47377,47378,47379,47380,47474,47475,47476,47477,48272,48289,48292,48296,48299,48305,48307,48310,48313,48749,48759,48760,48761,48762,48796,50174,50183,50185,50227,50230,50265,50271,50272,50273,50274,50304,50305,50306,52079,52151,52164,52180,52188,52189,52190,52209,52281,52286,52306,52456,52459,52461,52476,52478,52481,52491,52502,52503,52504,52674,52675,52676,52708,52709,52710,52711,52712,52724,52725,52726,52727,52728,52742,52753,52758,52759,52760,52761,52762,52763,52764,52765,52766,52767,52768,52769,52770,52771,52772,52773,52774,52775,52776,52777,52778,52779,52780,52781,52782,52783,52784,52785,52786,52787,52788,52789,52803,52804,52805,52816,52822,52825,52852,52853,52854,52855,52903,52904,52905,52917,52919,
52995,52996,52997,52998,52999,53000,53001,53002,53003,53658,53659,54547,54548,54682,54688,54689,54690,54691,54692,54693,54694,54695,54696,54697,54698,54699,54700,54701,54702,54703,54704,54705,54706,54707,54708,54709,54710,54711,54712,54713,54714,54715,54716,54717,56482,56499,56504,56980,56981,56982,56990,56991,56992,56995,57178,57179,57180,57229,57240,57241,57242

%macro count 0-*
%warning %0
%endmacro
count PATCH_386_TABLE
%defstr string PATCH_386_TABLE
%strlen length string
%warning length
test/20220825$ nasm -v
NASM version 2.16rc0 compiled on Aug 23 2022
test/20220825$ nasm test.asm -o /dev/null -DTEST=1000
Killed
test/20220825$ /usr/bin/time --format="%M\n" nasm test.asm -o /dev/null -DTEST=1000
Command terminated by signal 9
3569956
test/20220825$ ~/proj/nasmtest/nasm -v
NASM version 2.16rc0 compiled on Aug 25 2022
test/20220825$ ~/proj/nasmtest/nasm test.asm -o /dev/null -DTEST=1500
test.asm:56: warning: 5266 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 25892 [-w+user]
test/20220825$ /usr/bin/time --format="%M\n" ~/proj/nasmtest/nasm test.asm -o /dev/null -DTEST=1500
test.asm:56: warning: 5266 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 25892 [-w+user]
3313900
test/20220825$ /usr/bin/time --format="%M\n" ~/proj/nasmtest/nasm test.asm -o /dev/null -DTEST=1000
test.asm:56: warning: 4766 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 23392 [-w+user]
2623884
test/20220825$

The nasm executable is this revision: https://github.com/netwide-assembler/nasm/commit/3aebb20f123033dcd767f0abc46b18cbefed8091

With the following bugs patched:
https://bugzilla.nasm.us/show_bug.cgi?id=3392732
https://bugzilla.nasm.us/show_bug.cgi?id=3392803

The ~/proj/nasmtest/nasm executable is based on the same revision with these additional bugs patched:
https://bugzilla.nasm.us/show_bug.cgi?id=3392804
https://bugzilla.nasm.us/show_bug.cgi?id=3392805

The diff that fixes only *this* bug is as follows:

diff --git a/asm/preproc.c b/asm/preproc.c
index 0ff2b518..fed1cc39 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -162,7 +162,38 @@ static bool is_smac_param(enum token_type toktype)
  * is incorrect, as some token types strip parts of the string,
  * e.g. indirect tokens.
  */
+#if 0
 #define INLINE_TEXT (7*sizeof(char *)-sizeof(enum token_type)-sizeof(unsigned int)-1)
+#define TOKENPACKED
+#elif 0
+/*
+ * The minimum size is enough to hold "%00" and ".nolist",
+ * as these are compared directly to the Token.text.a field.
+ * Further, to have Token.text.p.pad be at least one byte,
+ * INLINE_TEXT must be at least sizeof(char *) long which is
+ * equal to 8 for long mode.
+ */
+#define INLINE_TEXT 8
+/*
+ * If the structures aren't specified as packed the compiler
+ * will expand struct Token to 32 bytes regardless it appears.
+ * So to minimise memory usage, pack the structures.
+ */
+#define TOKENPACKED __attribute__((packed))
+#else
+/*
+ * Setting the token structure size to 32 bytes appears to be
+ * sufficient to build the lDebug application, hg 7016dd710698,
+ * with the options -D_SYMBOLIC -D_DUALCODE -D_SYMBOLASMDUALCODE
+ * as well as -D_DEBUG -D_PM (lDDebugX build, with symbolic
+ * option and dual code segments).
+ *
+ * 64 bytes, the prior default for building the assembler for
+ * long mode, resulted in the assembler being OOM killed.
+ */
+#define INLINE_TEXT (32-sizeof(char *)-sizeof(enum token_type)-sizeof(unsigned int)-1)
+#define TOKENPACKED
+#endif
 #define MAX_TEXT (INT_MAX-2)
 
 struct Token {
@@ -171,12 +202,12 @@ struct Token {
     unsigned int len;
     union {
         char a[INLINE_TEXT+1];
-        struct {
+        struct TOKENPACKED {
             char pad[INLINE_TEXT+1 - sizeof(char *)];
             char *ptr;
         } p;
     } text;
-};
+} TOKENPACKED;
 
 /*
  * Note on the storage of both SMacro and MMacros: the hash table
It seems that this is the revision which introduced packing text inline into tokens: https://github.com/netwide-assembler/nasm/commit/8571f06061b47471a340e350fdfcd804098637d6

Before this change, a token with short text (e.g. a comma, or a decimal number below "65536") took up exactly as many bytes as were required to hold the text, plus the token structure itself (including a pointer to the text). After this change, each token unconditionally takes up at least 64 bytes (when the assembler is compiled for amd64 long mode).

Ideally I'd like to create a patch that allows run-time selection of the token size (64 vs 32 bytes, as 32 bytes suffice to build lDDebugX symbolic), but that seems more complicated. Is there interest in such a patch? If not, can NASM unconditionally use 32 bytes for the token structure instead of the calculation that led to 64 bytes in long mode? Or should I patch and build NASM separately for my use case?
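To make the size arithmetic concrete, here is a minimal sketch in C. The struct layouts are assumptions modelled on the diff quoted earlier in this report (a `next` pointer and a `type` field ahead of `len`); they are not copied from asm/preproc.c, and `enum tok_type`, `struct tok_wide`, `struct tok_narrow` and the two helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NASM's enum token_type. */
enum tok_type { TOK_OTHER, TOK_NUM };

/* Inline text capacity following the two formulas from the diff:
 * the prior default (7 pointers wide) and the proposed 32-byte head. */
#define INLINE_WIDE \
    (7 * sizeof(char *) - sizeof(enum tok_type) - sizeof(unsigned int) - 1)
#define INLINE_NARROW \
    (32 - sizeof(char *) - sizeof(enum tok_type) - sizeof(unsigned int) - 1)

/* Assumed layout: link, type, length, then inline text or a pointer. */
struct tok_wide {
    struct tok_wide *next;
    enum tok_type type;
    unsigned int len;
    union {
        char a[INLINE_WIDE + 1];
        struct {
            char pad[INLINE_WIDE + 1 - sizeof(char *)];
            char *ptr;
        } p;
    } text;
};

struct tok_narrow {
    struct tok_narrow *next;
    enum tok_type type;
    unsigned int len;
    union {
        char a[INLINE_NARROW + 1];
        struct {
            char pad[INLINE_NARROW + 1 - sizeof(char *)];
            char *ptr;
        } p;
    } text;
};

/* Size of one token head under each formula. */
size_t wide_token_size(void)   { return sizeof(struct tok_wide); }
size_t narrow_token_size(void) { return sizeof(struct tok_narrow); }
```

On a typical LP64 target (8-byte pointers, 4-byte enum and unsigned int) the two functions should report 64 and 32 bytes respectively, so every short token such as a comma costs twice as much with the wider head.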
The inline text tokens don't appear to be the only pessimisation between older and recent NASM. Observe:

ldebug/source$ nasm -v
NASM version 2.16rc0 compiled on Aug 23 2022
ldebug/source$ /usr/bin/time --format="%M KiB" nasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE=1 -D_SYMBOLASMDUALCODE
Command terminated by signal 9
3541284 KiB
ldebug/source$ ~/proj/nasmtest/nasm -v
NASM version 2.16rc0 compiled on Aug 25 2022
ldebug/source$ /usr/bin/time --format="%M KiB" ~/proj/nasmtest/nasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE=1 -D_SYMBOLASMDUALCODE
asmtabs.asm:407: warning: Most assembler table prefix bytes: 1 (ofs 4h) mne BOXCB variant (240h + 0*8 + 7),85,, [-w+user]
expr.asm:2843: warning: word data exceeds bounds [-w+number-overflow]
init.asm:1432: warning: patch_no386_table: 946 (Method 2) [-w+user]
init.asm:1432: warning: 1B=318 repo=46 run=426 byte=996 [-w+user]
init.asm:1437: warning: patch_386_table: 50 (Method 2) [-w+user]
init.asm:1437: warning: 1B=4 repo=11 run=13 byte=59 [-w+user]
2560268 KiB
ldebug/source$ oldnasm -v
NASM version 2.14.03rc2 compiled on Aug 31 2019
ldebug/source$ /usr/bin/time --format="%M KiB" oldnasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE=1 -D_SYMBOLASMDUALCODE 2>&1 | grep -v "warning: word data exceeds bounds"
asmtabs.asm:407: warning: Most assembler table prefix bytes: 1 (ofs 4h) mne BOXCB variant (240h + 0*8 + 7),85,, [-w+user]
init.asm:1432: warning: (writepatchtable:73) patch_no386_table: 946 (Method 2) [-w+user]
init.asm:1432: warning: (writepatchtable:74) 1B=318 repo=46 run=426 byte=996 [-w+user]
init.asm:1437: warning: (writepatchtable:73) patch_386_table: 50 (Method 2) [-w+user]
init.asm:1437: warning: (writepatchtable:74) 1B=4 repo=11 run=13 byte=59 [-w+user]
714240 KiB
ldebug/source$

Unpatched (nasm) gets OOM killed at 3.5 GiB. Patched runs to completion with 2.5 GiB. Older one needs less than 800 MiB. The resulting binary is identical.
Hmmm... this smells like a failure to reclaim storage to me. Basically malloc/free is likely to have the same kind of overhead as inlining the text, but if token heads aren't getting reused, this is a memory leak, and one which generic tools will not be able to see. So my strong guess is that either there is a missing token delete somewhere, or the token allocator fails to make a deleted token head available for reuse.
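The failure mode described here can be illustrated with a minimal free-list allocator. This is a sketch under assumptions, not NASM's allocator: `struct tok`, `tok_alloc`, `tok_delete`, `tok_blocks` and `TOK_BLOCK` are hypothetical names, and the block size is arbitrary. Heads are carved out of large blocks that are never individually freed, which is why a missing delete stays invisible to generic leak checkers: the blocks remain reachable, only the heads inside them are never recycled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified token head; not NASM's struct Token. */
struct tok {
    struct tok *next;
    int in_use;
};

#define TOK_BLOCK 4096  /* heads carved per malloc; arbitrary for this sketch */

static struct tok *free_toks = NULL;       /* list of reusable heads */
static unsigned long blocks_allocated = 0; /* how often we had to grow */

/* Take a head from the free list, refilling it from a fresh block if empty.
 * Blocks are deliberately never released in this sketch. */
struct tok *tok_alloc(void)
{
    struct tok *t = free_toks;

    if (!t) {
        struct tok *block = malloc(TOK_BLOCK * sizeof *block);
        size_t i;

        if (!block)
            abort();
        blocks_allocated++;
        for (i = 0; i < TOK_BLOCK - 1; i++)
            block[i].next = &block[i + 1];
        block[TOK_BLOCK - 1].next = NULL;
        t = block;
    }
    free_toks = t->next;
    t->next = NULL;
    t->in_use = 1;
    return t;
}

/* Return a head to the free list and hand back its successor, so that a
 * caller can walk a list while deleting it. */
struct tok *tok_delete(struct tok *t)
{
    struct tok *next = t->next;

    assert(t->in_use);
    t->in_use = 0;
    t->next = free_toks;
    free_toks = t;
    return next;
}

/* Number of blocks requested from malloc so far. */
unsigned long tok_blocks(void)
{
    return blocks_allocated;
}
```

As long as every allocated head is eventually passed to `tok_delete`, the block count stays flat. If some code path drops heads without deleting them, each further allocation eats into the free list and the block count keeps growing, which is the behaviour suspected above.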
> So my strong guess is that either there is a missing token delete somewhere, or the token allocator fails to make a deleted token head available for reuse.

Wouldn't know about the possibility of leaks. However, the token deletion appears to work as expected.

Anyway, I made a crude patch to two different NASM revisions to compare their use of preprocessor tokens. The newer revision is as described here, based on the commit 3aebb20f123033dcd767f0abc46b18cbefed8091, and patched like this:

diff --git a/asm/preproc.c b/asm/preproc.c
index 7724b12a..203664b5 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -1717,6 +1717,19 @@ static Token *tokenize(const char *line)
     return list;
 }
 
+static FILE * logfile = NULL;
+static unsigned long logamount = 0;
+static unsigned long logtotal = 0;
+static unsigned long logmodulo = 0;
+
+static void openlogfile(void);
+static void openlogfile(void) {
+  if (logfile) return;
+  logfile = fopen("nasmtoka.log", "wb");
+  if (!logfile) nasm_panic("unable to open log file");
+  return;
+}
+
 /*
  * Tokens are allocated in blocks to improve speed. Set the blocksize
  * to 0 to use regular nasm_malloc(); this is useful for debugging.
@@ -1733,6 +1746,13 @@ static Token *tokenblocks = NULL;
 static Token *alloc_Token(void)
 {
     Token *t = freeTokens;
+    openlogfile();
+    logamount++;
+    logtotal++;
+    if ((logmodulo++ & 8191) == 0) {
+      fprintf(logfile, "[%12lu] (%12lu) allocate\n", logamount, logtotal);
+      fflush(logfile);
+    }
 
     if (unlikely(!t)) {
         Token *block;
@@ -1770,6 +1790,12 @@ static Token *alloc_Token(void)
 static Token *delete_Token(Token *t)
 {
     Token *next;
+    openlogfile();
+    logamount--;
+    if ((logmodulo++ & 8191) == 0) {
+      fprintf(logfile, "[%12lu] (%12lu) delete\n", logamount, logtotal);
+      fflush(logfile);
+    }
 
     nasm_assert(t && t->type != TOKEN_FREE);
 

Diff also available at https://pushbx.org/ecm/test/20220826/new.diff

The older revision is based on commit 52266ad42490f48b91a70efb5c2f93ea281eeb60 and patched like this:

diff --git a/asm/preproc.c b/asm/preproc.c
index 95ca56fc..26cf3002 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -1192,6 +1192,19 @@ static void delete_Blocks(void)
     memset(&blocks, 0, sizeof(blocks));
 }
 
+static FILE * logfile = NULL;
+static unsigned long logamount = 0;
+static unsigned long logtotal = 0;
+static unsigned long logmodulo = 0;
+
+static void openlogfile(void);
+static void openlogfile(void) {
+  if (logfile) return;
+  logfile = fopen("nasmtoka.log", "wb");
+  if (!logfile) nasm_panic(0, "unable to open log file");
+  return;
+}
+
 /*
  * this function creates a new Token and passes a pointer to it
  * back to the caller. It sets the type and text elements, and
@@ -1202,6 +1215,13 @@ static Token *new_Token(Token * next, enum pp_token_type type,
 {
     Token *t;
     int i;
+    openlogfile();
+    logamount++;
+    logtotal++;
+    if ((logmodulo++ & 8191) == 0) {
+      fprintf(logfile, "[%12lu] (%12lu) allocate\n", logamount, logtotal);
+      fflush(logfile);
+    }
 
     if (!freeTokens) {
         freeTokens = (Token *) new_Block(TOKEN_BLOCKSIZE * sizeof(Token));
@@ -1229,6 +1249,12 @@ static Token *new_Token(Token * next, enum pp_token_type type,
 static Token *delete_Token(Token * t)
 {
     Token *next = t->next;
+    openlogfile();
+    logamount--;
+    if ((logmodulo++ & 8191) == 0) {
+      fprintf(logfile, "[%12lu] (%12lu) delete\n", logamount, logtotal);
+      fflush(logfile);
+    }
     nasm_free(t->text);
     t->next = freeTokens;
     freeTokens = t;

Diff also available at https://pushbx.org/ecm/test/20220826/old.diff

The command run with either executable is as follows:

nasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE -D_SYMBOLASMDUALCODE

Again, the older revision needs 714356 KiB, the newer one needs 2560204 KiB.
These are the ends of the resulting log files:

ldebug/source$ tail nasmtoka.old
[     9739325] (   378949151) delete
[     9738057] (   378952613) allocate
[     9738415] (   378956888) allocate
[     9738479] (   378961016) delete
[     9738607] (   378965176) delete
[     9738789] (   378969363) allocate
[     9739287] (   378973708) delete
[     9739839] (   378978080) delete
[     9740393] (   378982453) allocate
[     9740661] (   378986683) allocate
ldebug/source$ tail nasmtoka.log
[    71368223] (   436961040) allocate
[    71368229] (   436965139) allocate
[    71368351] (   436969296) delete
[    71368459] (   436973446) delete
[    71369195] (   436977910) delete
[    71369851] (   436982334) allocate
[    71370409] (   436986709) delete
[    71370513] (   436990857) allocate
[    71363959] (   436991676) delete
[    71355767] (   436991676) delete
ldebug/source$

The numbers in square brackets show how many tokens are currently allocated; the numbers in parentheses show the total number of tokens ever allocated (counting only allocations, not deletions). At the end of the run the newer revision holds about 71 million live tokens, the older one fewer than 10 million, so it does appear that the newer revision leaks more than the older. I'm not sure whether this explains the dramatic increase in memory use, however.
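The bookkeeping behind the two columns can be sketched in isolation. The following is a minimal illustration modelled on the logging patches above, not the patch itself: `struct tok_stats`, `stats_alloc`, `stats_delete` and `SAMPLE_MASK` are hypothetical names, and the log stream is passed in explicitly (it may be NULL to disable output).

```c
#include <assert.h>
#include <stdio.h>

/* Counters corresponding to the two log columns. */
struct tok_stats {
    unsigned long live;    /* square-bracket column: allocated minus deleted */
    unsigned long total;   /* parenthesised column: allocations only */
    unsigned long events;  /* allocations plus deletions, used for sampling */
};

/* Log one line per 8192 events, as the patches do with (x & 8191) == 0. */
#define SAMPLE_MASK 8191UL

static void stats_sample(struct tok_stats *s, FILE *log, const char *what)
{
    if (log && (s->events & SAMPLE_MASK) == 0) {
        fprintf(log, "[%12lu] (%12lu) %s\n", s->live, s->total, what);
        fflush(log);
    }
    s->events++;
}

/* Record one token allocation. */
void stats_alloc(struct tok_stats *s, FILE *log)
{
    s->live++;
    s->total++;
    stats_sample(s, log, "allocate");
}

/* Record one token deletion; the total is intentionally left unchanged. */
void stats_delete(struct tok_stats *s, FILE *log)
{
    s->live--;
    stats_sample(s, log, "delete");
}
```

A balanced allocator keeps `live` small relative to `total`; a steadily climbing `live` value is the signature of heads that are allocated but never deleted.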
I used git bisect on the NASM repo, running the following scriptlet to build and test NASM:

$ git clean -x -d -f; touch config/undef.h; ./autogen.sh; ./configure; make; git checkout autoconf; /usr/bin/time --format="%M KiB" ./nasm ~/wwwecm/test/20220825/test.asm

This test resulted in less than 7 MiB of memory use for good revisions, more than 2.8 GiB for bad revisions.

I started with https://github.com/netwide-assembler/nasm/commit/52266ad42490f48b91a70efb5c2f93ea281eeb60 as the good revision and https://github.com/netwide-assembler/nasm/commit/3aebb20f123033dcd767f0abc46b18cbefed8091 as the bad revision.

First bad revision is https://github.com/netwide-assembler/nasm/commit/de7acc3a46cb3da52464d246b814f8bf059a0360

de7acc3a46cb3da52464d246b814f8bf059a0360 is the first bad commit
commit de7acc3a46cb3da52464d246b814f8bf059a0360
Author: H. Peter Anvin (Intel) <hpa@zytor.com>
Date: Mon Aug 19 17:52:55 2019 -0700

    preproc: defer %00, %? and %?? expansion for nested macros, cleanups

    BR 3392603: When doing nested macro definitions, we need %00, %? and
    %?? expansion to be deferred to actual expansion time, just as the
    other parameters.

    Do major cleanups to the mmacro expansion code.

    Reported-by: Alexandre Audibert <alexandre.audibert@outlook.fr>
    Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>

 asm/preproc.c | 713 ++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 400 insertions(+), 313 deletions(-)
Here's the fix. I don't understand the dup_tlist behaviour enough (yet) to avoid using it, but adding free_tlist fixes the memory leak.

diff --git a/asm/preproc.c b/asm/preproc.c
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -5333,8 +5333,9 @@ static Token *expand_mmac_params(Token * tline)
                 tt = tokenize(tok_text(t));
                 tt = expand_mmac_params(tt);
                 tt = expand_smacro(tt);
-                /* Why dup_tlist() here? We should own tt... */
+                /* *tail = tt; */
                 dup_tlist(tt, &tail);
+                free_tlist(tt);
                 text = NULL;
                 change = true;
                 break;
Better patch, recreating the behaviour of dup_tlist then free_tlist without actually duplicating the tokens:

diff --git a/asm/preproc.c b/asm/preproc.c
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -5329,12 +5329,17 @@ static Token *expand_mmac_params(Token * tline)
             case TOKEN_INDIRECT:
             {
                 Token *tt;
+                Token *teach;
 
                 tt = tokenize(tok_text(t));
                 tt = expand_mmac_params(tt);
                 tt = expand_smacro(tt);
-                /* Why dup_tlist() here? We should own tt... */
-                dup_tlist(tt, &tail);
+                *tail = tt;
+                list_for_each(teach, tt) {
+                    tail = &teach->next;
+                }
+                /* dup_tlist(tt, &tail);
+                free_tlist(tt); */
                 text = NULL;
                 change = true;
                 break;
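The splice used in this patch can be shown on a generic singly linked list. This is a minimal sketch under assumptions: `struct node` and `splice_tail` are hypothetical names, and a plain for loop stands in for NASM's list_for_each macro. The idea is to link the existing nodes in at the tail slot and then advance the tail slot to the `next` field of the last node, so that no node is copied and none needs to be freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; stands in for a preprocessor token. */
struct node {
    struct node *next;
    int value;
};

/*
 * Append the list starting at `head` to the slot `*tail` and return the
 * new tail slot, i.e. the address of the last node's next pointer.
 * If `head` is NULL the slot is cleared and returned unchanged.
 * Ownership of the nodes moves to the destination list.
 */
struct node **splice_tail(struct node **tail, struct node *head)
{
    struct node *n;

    *tail = head;
    for (n = head; n; n = n->next)
        tail = &n->next;
    return tail;
}
```

Compared with duplicating the list and then freeing the original, this touches each node once and allocates nothing; the cost is a single walk to find the end, which a helper that returns the last element directly could shorten further.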
Patch could probably use list_last instead: https://github.com/netwide-assembler/nasm/blob/3aebb20f123033dcd767f0abc46b18cbefed8091/include/nasmlib.h#L282 But that's just optimisation.
Fix checked in. Huge thanks for tracking this down!